Tag
11 articles
Alibaba's Qwen team has released Qwen3.5 Omni, a native multimodal model capable of processing text, audio, video, and real-time interaction. Positioned as a competitor to Google's Gemini 3.1 Pro, the model marks a significant step forward in multimodal AI architecture.
Explore the advanced technical features of Google's Gemini 3.1 Flash Live, a real-time multimodal voice model designed for low-latency audio and video interactions.
Researchers are exploring how to build vision-guided web AI agents using the MolmoWeb-4B model, which interprets screenshots to navigate and interact with websites without relying on HTML parsing.
Finance leaders are leveraging multimodal AI to automate complex workflows, overcoming traditional document processing limitations. These advanced AI frameworks are transforming how financial institutions handle unstructured data.
Learn how Mistral AI's new Mistral Small 4 model unifies instruction following, reasoning, and multimodal capabilities into one powerful AI system using Mixture of Experts technology.
Learn how GLM-OCR, a new AI model from Zhipu AI, helps convert complex documents into digital data by reading text, understanding layout, and extracting key information.
Google introduces Gemini Embedding 2, a multimodal embedding model that unifies text, images, video, audio, and documents into a single vector space, streamlining AI development and performance.
OpenAI employees and leaked project details suggest the company is developing a new omni model, a multimodal AI system that could unify text, audio, and image processing.
This article explains how Meta and NYU researchers are exploring unlabeled video data as a new training frontier for multimodal AI models, challenging conventional assumptions about model architecture and data prioritization.
Learn about Phi-4-Reasoning-Vision-15B, a new AI model from Microsoft that combines image and text understanding to solve math problems, interpret science content, and navigate user interfaces.
Google AI announced significant updates to its Gemini 3.1 Pro model and introduced the new Nano Banana 2, showcasing advancements in both high-performance and edge-optimized AI capabilities.